Sampling from Dirichlet partitions: estimating the number of species
Abstract
Consider the random Dirichlet partition of the unit interval into n fragments with parameter θ > 0. We recall the unordered Ewens sampling formulae from finite Dirichlet partitions. Because the number of distinct species visited during the sampling process is a key variable for estimation purposes, we focus on this quantity, and we illustrate the formulae in specific cases. We then use these preliminary results on the distribution of species frequencies to address the following sampling problem: what is the estimated number of species when sampling from Dirichlet populations? The results agree with those obtained in sampling theory from random proportions with a Poisson-Dirichlet distribution. Finally, we apply the proposed estimators to two real data sets.
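The central quantity above, the number D_k of distinct species seen in k draws, can be illustrated with a minimal Python sketch. It assumes the partition is the symmetric Dirichlet law Dir(θ, …, θ) on n fragments, simulated by normalizing n independent Gamma(θ, 1) variables. Under that assumption each fragment length S has a Beta(θ, (n−1)θ) law, so a given species is missed in k draws with probability E[(1−S)^k], which gives E[D_k] = n(1 − Γ((n−1)θ+k)Γ(nθ) / (Γ((n−1)θ)Γ(nθ+k))). Letting n → ∞ with nθ → γ recovers the Ewens (Poisson-Dirichlet) value Σ_{i<k} γ/(γ+i). All function names are hypothetical, and `estimate_n` is a simple moment-matching illustration, not one of the estimators proposed in the paper.

```python
import math
import random


def expected_distinct(n: int, theta: float, k: int) -> float:
    """Expected number of distinct species among k draws from a symmetric
    Dirichlet partition Dir(theta, ..., theta) with n fragments (assumed model)."""
    if k == 0:
        return 0.0
    if n == 1:
        return 1.0  # a single fragment is always visited
    a = (n - 1) * theta
    # log P(a given species is never visited) = log E[(1 - S)^k],
    # with S ~ Beta(theta, (n - 1) * theta)
    log_p_miss = (math.lgamma(a + k) + math.lgamma(n * theta)
                  - math.lgamma(a) - math.lgamma(n * theta + k))
    # n * (1 - p_miss), using expm1 to keep precision when p_miss is close to 1
    return -n * math.expm1(log_p_miss)


def ewens_expected_distinct(gamma: float, k: int) -> float:
    """Poisson-Dirichlet (Ewens) limit: n -> infinity with n * theta -> gamma."""
    return sum(gamma / (gamma + i) for i in range(k))


def simulate_distinct(n: int, theta: float, k: int, rng: random.Random) -> int:
    """One Monte Carlo run: draw the partition, sample k individuals, count species."""
    # Normalized independent Gamma(theta, 1) variables follow Dir(theta, ..., theta)
    g = [rng.gammavariate(theta, 1.0) for _ in range(n)]
    total = sum(g)
    weights = [x / total for x in g]
    # Sample k individuals with replacement; each falls in fragment i w.p. weights[i]
    sample = rng.choices(range(n), weights=weights, k=k)
    return len(set(sample))


def estimate_n(d: int, k: int, theta: float, n_max: int = 10**6):
    """Hypothetical moment-matching estimator: smallest n with E[D_k] >= d.
    Returns None when no finite estimate exists in [d, n_max]."""
    if d >= k:
        return None  # every draw was a new species
    lo, hi = max(d, 1), n_max
    if expected_distinct(hi, theta, k) < d:
        return None
    # Relies on E[D_k] increasing with n for fixed theta, so bisection applies
    while lo < hi:
        mid = (lo + hi) // 2
        if expected_distinct(mid, theta, k) >= d:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

As a sanity check, with θ = 1 the formula reduces to E[D_k] = nk/(n+k−1), so observing d = 19 species in k = 30 draws gives `estimate_n(19, 30, 1.0) == 51`. The Monte Carlo function can be averaged over many runs to check the closed form for other values of θ.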
Similar resources
A Species Sampling Model with Finitely Many Types
A two-parameter family of exchangeable partitions with a simple updating rule is introduced. The partition is identified with a randomized version of a standard symmetric Dirichlet species-sampling model with finitely many types. A power-like distribution for the number of types is derived.
Asymptotics for the number of blocks in a conditional Ewens-Pitman sampling model
The study of random partitions has been an active research area in probability over the last twenty years. A quantity that has attracted a lot of attention is the number of blocks in the random partition. Depending on the area of applications this quantity could represent the number of species in a sample from a population of individuals or the number of cycles in a random permutation, etc. In ...
Collapsed Gibbs Sampling for Latent Dirichlet Allocation on Spark
In this paper we implement a collapsed Gibbs sampling method for the widely used latent Dirichlet allocation (LDA) model on Spark. Spark is a fast in-memory cluster computing framework for large-scale data processing, which has been the talk of the Big Data town for a while. It is suitable for iterative and interactive algorithms. Our approach splits the dataset into P ∗ P partitions, shuffles a...
Sampling formulae arising from random Dirichlet populations
Consider the random Dirichlet partition of the interval into n fragments at temperature θ > 0. Some statistical features of this random discrete distribution are recalled, together with explicit results on the law of its size-biased permutation. Using these, pre-asymptotic versions of the Ewens and Donnelly-Tavaré-Griffiths sampling formulae from finite Dirichlet partitions are computed exactly...
Stirling number of the fourth kind and lucky partitions of a finite set
The concept of Lucky k-polynomials and in particular Lucky χ-polynomials was recently introduced. This paper introduces Stirling number of the fourth kind and Lucky partitions of a finite set in order to determine either the Lucky k- or Lucky χ-polynomial of a graph. The integer partitions influence Stirling partitions of the second kind.
Publication date: 2008